IEEE Transactions on Biomedical Engineering
● Institute of Electrical and Electronics Engineers (IEEE)
Preprints posted in the last 7 days, ranked by how well they match IEEE Transactions on Biomedical Engineering's content profile, based on 38 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.
Burke, K. M.; Calcagno, N.; Mandepudi, S.; Premasiri, A.; Hall, K. C.; Vieira, F. G.; Berry, J. D.; Straczkiewicz, M.
Show abstract
Wearable digital health technologies may complement traditional gait assessments in amyotrophic lateral sclerosis (ALS) by sensitively capturing real-world mobility changes. In this study, we validated six digital gait metrics derived from ankle-worn sensors in a natural history cohort of 182 individuals with ALS. Investigated metrics correspond to various aspects of gait, including volume, speed, intensity, similarity, variability, and fragmentation. Longitudinal analyses showed significant declines in step count, peak cadence, stride intensity, and stride similarity, with increasing stride duration variability and walking fragmentation over 52 weeks. Many participants exhibited greater relative change in the gait metrics than the self-reported ALS Functional Rating Scale-Revised (ALSFRS-RSE). Stratified analyses revealed that digital metrics captured significant functional decline even in participants with stable walking scores on the ALSFRS-RSE. These findings support the potential utility of these metrics for disease monitoring in ALS clinical care and trials.
Zhao, J.; Ahmadi, S.-A.; Decker, J.; Zwergal, A.; Eulenburg, P. z.; Flanagin, V. L.; Wuehr, M.
Show abstract
Quantitative eye movement analysis is important for neuro- logical diagnostics, yet existing video-oculography (VOG) systems typ- ically require calibration, device-specific settings, or accurate gaze la- bels. We present VOGeo-Gaze, a real-time, calibration-free, geometry- aware neural network that estimates gaze by reconstructing anatomi- cally meaningful eyeball parameters from image features. The method combines segmentation-driven projection geometry, a refraction-aware pupil correction module, and temporal anatomical stabilization, so gaze is derived from interpretable eye geometry rather than direct angular regression. Trained only on the public TEyeD dataset with weak gaze supervision, VOGeo-Gaze was evaluated on 116 clinical recordings from 17 patients and 19 healthy subjects using EyeSeeCam, a clinical gold- standard VOG system. It achieved median absolute angular errors of 0.33{whitebullet} horizontally and 0.35{whitebullet} vertically, with nearly 92% of recordings below 1{whitebullet} error while operating at >300 FPS. These results demonstrate sub-degree clinical gaze estimation without subject-specific calibration, camera intrinsics, or accurate gaze labels, providing a scalable and inter- pretable alternative to conventional VOG pipelines. Code is available at https://github.com/DSGZ-MotionLab/VOGeo-Gaze.
Shah, M.
Show abstract
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease affecting more than 450,000 individuals worldwide and is frequently diagnosed more than 12 months after symptom onset, delaying intervention during a critical early window. Because up to 80% of patients develop dysarthria within two years, subtle changes in speech provide a signal of early bulbar motor neuron degeneration. However, existing speech-based systems rely on supervised classification trained on limited datasets, achieving moderate sensitivity and depending heavily on labeled disease examples, which restrict scalability and early detection. This study introduces SPEAK-NORM, the first-ever normative speech modeling framework for early ALS diagnosis, which learns age- and sex-conditioned motor-speech distributions exclusively from healthy individuals. A conditional variational autoencoder models coordination of hypoglossal, laryngeal, and respiratory motor pathways, and deviation from this healthy manifold is quantified through latent representations and reconstruction error to form a 354-dimensional profile. A calibrated linear Support Vector Machine performs subject-level classification under subject-disjoint validation. On the VOC-ALS database (n = 153), SPEAK-NORM achieves 98% accuracy with balanced sensitivity and specificity, significantly outperforming established clinical acoustic indices and prior systems. The framework maintains strong performance under cross-task generalization and when retrained on healthy controls in independent dementia and Parkinson disease cohorts, demonstrating disease-specific deviation patterns rather than generic neurodegenerative change. Spectral, temporal, and latent separations further support interpretability. By modeling healthy speech instead of memorizing disease examples, SPEAK-NORM enables scalable early neuromotor screening using recording devices, with potential to support earlier diagnosis, differential classification, and monitoring of ALS progression.
Hsiao, C.; Cheng, Y.-R.; Yang, C.-Y.; Hsu, F.-S.
Show abstract
Subjective auditory-perceptual evaluation and uninterpretable deep learning models limit the clinical assessment of voice disorders. This study proposes a two-phase zero-shot framework to evaluate voice pathology. First, an Audio Spectrogram Transformer is fine-tuned on the Perceptual Voice Quality Database to generate an acoustic latent space. Second, Orthogonal Procrustes analysis maps these acoustic embeddings directly onto the semantic space of a pre-trained Sentence Transformer. The geometric alignment produced continuous semantic axes that outperformed a supervised machine learning baseline in regressing clinician-rated GRBAS (Grade, Roughness, Breathiness, Asthenia, and Strain) severity scales. Furthermore, these axes correlate with traditional acoustic measures, including Harmonics-to-Noise Ratio and local jitter, while remaining robust when applied to aperiodic signals by not requiring fundamental frequency extraction. Most importantly, the model achieved zero-shot semantic expansion, successfully evaluating voices using an untrained, natural clinical vocabulary beyond the GRBAS scale. External validation on the Voice ICarus Database confirmed cross-corpus stability and demonstrated the capacity for zero-shot differential phenotyping of specific etiologies, such as hypokinetic dysphonia and reflux laryngitis. By bridging acoustic and semantic latent spaces, this framework offers an objective, continuous, and transparent metric for evaluating voice quality using voice descriptive vocabulary.
Jones, G.; Otsuka, K.; Fujisawa, N.; Yamaura, H.; Matsumoto, K.; Okamoto, A.; Yamaguchi, T.; Shimada, T.; Kagawa, S.; Yamazaki, T.; Akasaka, T.; Bouma, B. E.; Villiger, M.; Fukuda, D.
Show abstract
Background: Quantitative lipid assessment is central to identifying rupture-prone coronary plaques and represents a therapeutic target for lipid-lowering therapy. Near-infrared spectroscopy (NIRS)-derived lipid core burden index (LCBI) is well validated and widely used for detecting lipid-rich lesions. Optical frequency domain imaging (OFDI) is increasingly adopted for guiding percutaneous coronary intervention (PCI) due to its high-resolution structural imaging capabilities. Depolarization-sensitive OFDI (depOFDI) provides intrinsic lipid contrast and may enable combined structural and compositional plaque characterization within a single OFDI-based platform. Objective: To define an OFDI-derived lipid metric and evaluate its agreement with NIRS-derived LCBI. Methods: Thirty-three patients underwent both polarization-sensitive OFDI and NIRS-intravascular ultrasound imaging during PCI. After exclusion of 4 datasets, 29 co-registered pullbacks were analyzed. A signal-to-noise-corrected depolarization metric was used to identify lipid-rich regions and generate depOFDI chemograms. maxLCBI4mm value and location, as well as total LCBI, were computed and compared with NIRS. Results: depOFDI demonstrated strong agreement with NIRS, showing high correlation for maxLCBI4mm (r^2 = 0.862) and total LCBI (r^2 = 0.867), along with strong spatial concordance for the location of the maxLCBI4mm (r^2 = 0.900). Bland-Altman analysis of LCBI4mm showed minimal bias (10.7) with 95% limits of agreement of [81.4 to 102.8]. Conclusions: depOFDI enables accurate quantification of lipid burden alongside the high-resolution structural information inherently provided by OFDI. Because depolarization metrics can be derived from polarization-diverse detection available in many commercial OFDI systems, this approach provides a practical pathway toward comprehensive plaque characterization within existing PCI workflows, without the need for additional imaging modalities.
Rezaeitaleshmahalleh, M.; Masoumi, S.; Debalme, E.; Sundt, T. M.; Aranki, S. F.; Shin, B.; Nezami, F. R.
Show abstract
Background: Coronary artery bypass grafting (CABG) remains the standard of care for complex multivessel and left main coronary artery disease. However, current preoperative planning remains largely subjective, relying on qualitative interpretation of coronary CT angiography (CCTA), operator-dependent stenosis grading, and fragmented multi-software workflows. Invasive fractional flow reserve (FFR), the reference standard for physiologic lesion assessment, is infrequently acquired preoperatively, leaving distal anastomosis planning without an objective hemodynamic basis. Methods: We developed a fully automated, AI-powered platform that converts routine CCTA into a patient-specific CABG planning workflow through five integrated modules: nnU-Net based segmentation of coronary lumen and calcification; quantitative morphological and topological characterization generating more than thirty descriptors; automated stenosis detection using a local reference-radius formulation; a nine-point composite scoring framework for distal anastomosis site selection incorporating luminal caliber, landing-zone length, calcification burden, distal perfusion reserve, and bifurcation proximity; and interactive virtual graft construction coupled to a distributed reduced-order solver for pre- and post-bypass FFR estimation. Results: Lumen segmentation achieved a mean Dice similarity coefficient of 0.96 {+/-} 0.01, whereas calcium segmentation achieved 0.73 {+/-} 0.15 on the held-out cohort. Platform-derived FFR demonstrated strong agreement with invasively measured FFR (r=0.96, mean absolute relative difference 1.73 {+/-}1.42%) across the evaluated lesions, supporting the physiologic validity of the reduced-order hemodynamic solver. End-to-end analysis from raw CCTA to hemodynamic assessment and virtual graft planning was completed in approximately seven minutes per case on a standard workstation, representing a substantial reduction in processing time compared with conventional multi-tool and CFD-based workflows. Conclusions: The proposed platform demonstrates the feasibility of rapid, reproducible, and physiology-informed CABG planning using routine CCTA. By integrating anatomical characterization, automated target-site analysis, virtual graft construction, and reduced-order hemodynamic assessment into a single workflow, the framework provides objective, quantitative surgical decision support compatible with routine clinical workflows. Keywords: Coronary artery bypass grafting (CABG); Fractional flow reserve (FFR); Coronary CT angiography (CCTA); Surgical planning
Vanegas Mueller, E.; Joe-Oshodi, A.; Banerjee, A.; Villarroel, M.
Show abstract
Cardiovascular disease is the leading cause of death worldwide. Sudden cardiac death (SCD) accounts for roughly 50% of all cardiac deaths. The electrocardiogram (ECG) is widely used for early diagnosis of cardiac disease. However, the complexity of accurate interpretation limits the ECG's efficacy. Modern deep learning methods have been applied to assist clinicians in diagnosis. We applied Neural Architecture Search (NAS), an automated machine learning technique, to identify optimal deep learning architectures for classifying cardiac arrhythmias from ECGs. We applied the Differentiable Architecture Search strategy to an AutoFormer search space to identify optimal self-attention architectures for arrhythmia classification. We trained, validated, and tested the resulting model on the PhysioNet Challenge 2021 dataset (n = 88,253), comprising ECGs across three continents. We performed a hyperparameter optimisation on the NAS output, exploring input patch size, class weighting, and loss function. We evaluated performance using the PhysioNet Challenge metric and the area under the receiver operating characteristic curve (AUROC). The NAS converged towards minimal architectural configurations (embedding dimension: 384, depth: 4, self-attention heads: 4, MLP ratio: 1) with a validation challenge metric of 0.66 (PhysioNet Challenge 21 Winner: 0.63). The NAS-created network achieved an AUROC of 0.97 and a challenge metric of 0.71 during testing. Normal Sinus Rhythm and Sinus Tachycardia achieved AUROCs of 0.99. Low-QRS Voltage and T-wave abnormality were the worst-performing arrhythmias, with AUROCs of 0.89 and 0.90, respectively. We interpret that architectural simplicity drives performance in arrhythmia classification. Because SCD is unexpected, prevention strategies in free-living environments require lightweight computational resources suitable for wearable devices. Class imbalance fundamentally limits classification performance for rare arrhythmias such as Low-QRS Voltage and T-wave inversion, irrespective of hyperparameter choices. However, the self-attention mechanism can autonomously abstract clinical representations, simplifying clinical deployment by eliminating the need for an explicit feature-extraction pipeline.
Kurt, F.; Subasi, S. N.; Yakisan, E. S.; Subasi, A.
Show abstract
Background: Wearable technologies enable scalable and continuous monitoring of emotional states through passive sensing of physiological and behavioral signals. However, conventional learning approaches often struggle to model the complex temporal, contextual, and relational dependencies underlying human emotions. To address these limitations, we propose a graph-based framework that represents multimodal wearable observations as heterogeneous knowledge graphs enriched with semantic information derived from Large Language Models (LLMs), enabling richer contextual understanding beyond raw sensor measurements. Methods: We constructed a heterogeneous knowledge graph using multimodal Fitbit physiological signals and affective self-report data collected from 45 users. Framing mood prediction and emotion detection was formulated as both binary and ternary node classification tasks. We evaluated five baseline heterogeneous Graph Neural Network (GNN) architectures and compared them with the proposed Semantically Gated Augmented Graph Neural Network (SeGA-GNN) framework, which dynamically integrates LLM-generated semantic embeddings into graph representations through a gated cross-modal fusion mechanism. Results: The baseline GNN models achieved strong performance, with classification accuracies ranging from 0.7525 to 0.9739 for binary classification and 0.6249 to 0.9699 for ternary classification. The proposed SeGA framework consistently improved predictive performance across most architectures. In particular, semantic augmentation transformed the HAN model from moderate baseline performance into near-perfect emotion recognition capability, achieving SeGA-HAN Accuracy = 0.9988 and AUC = 1.0000 for binary classification and Accuracy = 0.9979 and AUC = 1.0000 for ternary classification. Discussion and Conclusion: Integrating LLM-derived semantic contextualization into heterogeneous graph learning enables effective modeling of contextual information that is not directly captured by wearable physiological signals alone. The proposed SeGA-GNN framework demonstrates that adaptive semantic fusion substantially improves the accuracy, robustness, and interpretability of wearable-based emotion detection. These findings establish a promising direction for next-generation wearable affective computing systems and intelligent emotion-aware applications.
Hofmeister, J.; Brina, O.; Rosi, A.; Bernava, G.; Reymond, P.; Muster, M.; Lovblad, K.-O.; Machi, P.
Show abstract
Background: Three-dimensional visualization and quantitative analysis of cerebral arteries on 3DRA are central to endovascular treatment planning, device selection, and cerebrovascular research. Manual segmentation is time-consuming and operator-dependent, yet no open-source deep learning model has been prospectively validated for this task on 3DRA. Methods: A nnUNet v2 model was trained for binary cerebral artery segmentation on 400 consecutive 3DRA acquisitions from three angiographic systems, comparing four configurations across architectures and loss functions. The best-performing configurations were prospectively validated on 40 patients using a dual approach: quantitative metrics (DSC, clDice, HD95, ASD, Precision, Recall), and blinded expert qualitative evaluation by two interventional neuroradiologists assessing 12 arterial segments, a global quality score, and clinical usability across 40 test cases. Results: The ensemble model achieved median DSC 0.917, clDice 0.932, and HD95 1.494 mm. Global quality scores were significantly lower for nnUNet v2 than for expert segmentations (median 4 vs 5, p<0.001), but nnUNet v2 segmentations were rated clinically usable in 88-90% of cases versus 95-98% for expert segmentations, without significant difference on the binary usability criterion. A consistent proximal-to-distal quality gradient was identified, with comparable scores at proximal arteries and the largest differences at distal arterial segments. Conclusion: nnUNet v2 with topology-aware training provides clinically usable cerebral artery segmentations on 3DRA, prospectively validated through both quantitative metrics and structured expert qualitative assessment, and represents a reproducible open-source foundation for endovascular and research applications.
Callet, C.; Bertrand, M.; Guzman, K.; Mece, P.; Rossi, E. A.; Grieve, K.
Show abstract
The retinal nerve fiber layer, composed of axon bundles converging toward the optic nerve, is a key biomarker for diagnosing and monitoring glaucoma and other neurodegenerative diseases. High-resolution en face imaging of individual nerve fiber bundles offers morphological information beyond what conventional optical coherence tomography provides, yet clinical integration remains limited by the lack of automated analysis tools and normative data. Here, we imaged 14 healthy volunteers using time-domain full-field optical coherence tomography and adaptive optics scanning laser ophthalmoscopy, and developed automated pipelines to quantify bundle width, trajectory, tortuosity, and orientation. Bundles were on average 25% wider at shallower retinal depths, width measurements were consistent across imaging modalities, and estimated axon count per bundle decreased significantly with age. Global trajectory analysis revealed systematic deviations of high resolution data from existing mathematical models, particularly in the temporal sector, leading us to propose two refined trajectory models. These normative results provide a foundation for high resolution biomarkers for use in investigations of retinal neurodegeneration.
Kurz, E.; Valli, G.; Meyer, T.; Proger, S.; Schwesig, R.; Bartels, T.; Delank, K.-S.; Sack, I.; Aghamiry, H. S.
Show abstract
Abstract Purpose: MyotonPRO (MTP) and time-harmonic elastography (THE) are increasingly used to assess muscle mechanical properties, yet they operate on fundamentally different physical principles. MTP measures composite MTP stiffness (N/m) through surface oscillations, while THE quantifies intrinsic shear modulus (THE stiffness, kPa) via propagating shear waves. This study aimed at systematically compare MTP and THE measurements in the vastus lateralis muscle across different contraction intensities and examine how the skin layer and subcutaneous fat (SLSF) thickness influence their relationship. Methods: Twenty-six healthy adults (15 males, 11 females; age 25 [SD 4] years) underwent MTP and THE measurements of the vastus lateralis at rest and during isometric contractions at 15% and 30% maximal voluntary contraction (MVC). Effects of contraction intensities on tissue properties were assessed using univariate analyses of variance with repeated measures. Associations between the different outcomes of THE and MTP technologies were explored using Pearson's correlations and partial correlation coefficients separately for each contraction intensity with adjustment of the SLSF thickness of participants. Results: Both technologies detected contraction intensity-dependent stiffening across all outcomes (p < 0.001). THE stiffness increased from 5.3 [1.2] kPa at rest to 15.6 [6.1] kPa at 30% MVC; THE wave attenuation increased from 0.83 [0.19] to 1.42 [0.36] s/m while MTP stiffness increased from 337.3 [49.3] N/m at rest to 529.4 [160.7] N/m at 30% MVC. Correlations between modalities were weak and condition-dependent. THE wave attenuation did not significantly correlate with any MTP outcome across conditions. Conclusion: MTP and THE detect contraction-induced stiffening through fundamentally different physical mechanisms and should not be regarded as interchangeable. Their correlation is modest at rest and breaks down (or reverses) during active contraction, with subcutaneous fat as a key modifying factor. Clinical trial number: Not applicable.
Yang, K.; Shi, P.; Huang, H.; Musio, F.; Baazaoui, H.; Aydin, O. U.; Hilbert, A.; Hamadache, R. E.; Yalcin, C.; Zhang, M.; Falcetta, D.; de la Rosa, E.; Shit, S.; Prabhakar, C.; Wittmann, B.; Rokuss, M. R.; Kirchhoff, Y.; Al-Maskari, R.; Hoeher, L.; Juchler, N.; Casamitjana, A.; Cleary, J.; Schmick, A.; Baumgartner, P.; Deseoe, J.; Vandans, O.; Lee, D.; Oh, K.; LaBella, D.; Mazher, M.; Niederer, S. A.; Qayyum, A.; Liu, Y.; Chen, J.; Kim, W.; Asawalertsak, N.; Kim, M.; Shin, D.; Park, S.-H.; Kikuchi, S.; Zhang, Y.; Liu, J.; Cui, Y.; Qiu, Y.; Verschuur, A.; Zhang, J.; van der Schaaf, I.; Su, R.;
Show abstract
We present the TopBrain 2025 Challenge, the first benchmark for fine-grained multiclass segmentation of the whole brain vasculature in both computed tomography angiography (CTA) and magnetic resonance angiography (MRA). Building on the TopCoW challenge, TopBrain scales vessel annotation from the Circle of Willis to the entire brain, introducing a dataset of 90 annotated volumes across 48 landmark vessel classes spanning arterial and venous systems, of which 50 training volumes are publicly released. Vessel definitions were consolidated from established neuroanatomical references into a unified annotation scheme, and vessel caliber measurements along the centerline are reported for the first time across the whole brain vascular anatomy. To address the unique challenges of multiclass brain vessel segmentation, we propose an evaluation framework that accounts for detection in segmentation performance, assesses anatomical plausibility, and introduces novel contamination metrics that characterize inter-class prediction errors. Fifteen teams from over 220 registered participants submitted algorithms to the benchmark. The top-performing teams built on nnUNet with principled system design choices, achieving around 80% Dice scores, near-zero invalid neighbor counts, over 60% F1 scores for side-road vessels, and below 18% foreground contamination ratio. Larger vessels are easier to segment, while smaller and more complex vessels remain the true bottleneck. The annotated datasets and podium-finish algorithms are made publicly available on Zenodo.
Sozol, S. S.; Dev Nath, B. C.; Fahim, F. M. S.; Suzana, N. N.; Mirza, J. F.; Ahmmed, S.; Zohra, F.-T.; Zafr, A. H. A.; Uddin, M. N.; Mondal, M. R. H.; Hoque, A. S. M. L.
Show abstract
Machine learning (ML) is being considered to help diagnose cardiovascular diseases (CVD). Still, challenges like inconsistent and limited datasets, limited infrastructure, and global inequalities lead to the need for a reliable and practicable ML solution. This paper presents an ML-driven framework for predicting CVD risk scores and classifying status. Several data preprocessing techniques, including multiple imputation by chained equations (MICE), outlier removal, are considered. In addition, hyperparameter tuning is performed with the GridSearchCV tuning technique. Moreover, a consensus-driven five-feature selection method is applied to identify optimal predictors. The dataset used in this study contains healthcare records related to future CVD risk scores, comprising 1,529 patient records with 22 features. The optimized stacked ensemble model is applied to the dataset and achieves a cross-validated coefficient of determination value of 98.13% for CVD risk score regression. Comparative evaluation with other ML models confirmed improved accuracy, efficiency, and interpretability. The explainable AI technique SHAP is applied to interpret predictions and highlight key risk factors. Moreover, a deployment-ready web platform with multi-role access has been developed that demonstrates clinical applicability. The proposed framework offers a reliable and interpretable tool for early detection of CVD and personalized risk assessment. In the future, this work can be extended to integrate longitudinal data, medical imaging, and deep learning to improve generalizability and strengthen real-world impact.
Hameed, S.; Henry, K.; Jiang, F.; Bhusal, B.; Dillenbeck, H.; Gakenheimer-Smith, L.; Webster, G.; Golestani Rad, L.
Show abstract
Pediatric patients with cardiac implantable electronic devices (CIEDs) face limited MRI access due to RF-induced heating, and computational modeling is increasingly used to characterize this risk. The validity of these simulations, however, depends on pairing body models with clinically realistic lead configurations, guidance that is currently lacking. We retrospectively analyzed 302 CIED surgeries in 281 pediatric patients to derive weight-based constraints for simulation design. Weight alone discriminated epicardial from endocardial lead implantation with AUC = 0.90, and adding age and height yielded no improvement, supporting weight as a sufficient single-parameter selection metric. The probabilistic crossover between approaches occurred at 44~kg, substantially higher than the 10 to 15~kg threshold commonly cited in the literature, with a broad transition zone of 21 to 66~kg in which both lead types were routinely used. Lead length was likewise weight-constrained: only 25~cm leads were observed in patients below 6~kg, and leads of 45~cm or longer were uncommon below 50~kg. These findings yield a three-tier framework, with epicardial-only configurations below 21~kg, dual configurations within 21 to 66~kg, and weight-thresholded lead lengths throughout, enabling MRI safety simulations to focus on clinically realizable anatomy and device combinations.
Lewis, A.; Arkam, F.; Steel, B.; Chen, E.; Singh, P.; Yakdan, S.; Becker, I.; Guo, W.; Shahrabani, A.; Payne, P. R.; Ghogawala, Z.; Steinmetz, M. P.; Neuman, B.; Ray, W. Z.; Duncan, R.; Greenberg, J.
Show abstract
Background Gait impairment is a central sign of cervical spondylotic myelopathy (CSM) that is typically evaluated through subjective patient-reported questionnaires or objective in-clinic measures. These systems require substantial resources to administer and are poorly suited for longitudinal monitoring, however, emerging smartphone applications present an efficient alternative. We developed and assessed the validity of a data processing framework based on the SynapTrack smartphone application to assess gait function in individuals with CSM. Methods Participants completed walking tasks which were recorded on both the SynapTrack app and a gold standard gait mat. Acceleration data extracted from the smartphone by the app were filtered and processed to produce gait cycle features including velocity, step time, waveform features and frequency domain features. Standard gait features were compared across the two methods by correlation and Bland-Altman plots to assess validity. App-based gait features were then compared to the standard modified Japanese Orthopedic Assessment (mJOA) assessment to determine construct validity through correlation and ability to discriminate between individuals with CSM and healthy controls. Finally, intraclass correlation coefficients and coefficients of variation were used to measure test-retest reliability and standard variation across app features. Results A total of 110 participants were included in this study, of which 55 (50%) had CSM, 24 (22%) had peripheral neuropathy, and 31 (28%) were healthy controls. SynapTrack gait measures including velocity, step time, and double support showed strong validity as indicated through Bland-Altman plots and high correlation (>0.8) with mat features. In addition to the gait features, acceleration root mean square, acceleration crest, spectral entropy, and dominant frequency showed strong construct validity compared to the mJOA across correlation (0.2-0.54), trend test (p < 0.001), and AUROC (0.62-0.79) analyses. ICCs showed moderate test-retest reliability (0.52-0.67). Discussion The proposed framework for processing gait data showed strong validity compared to the gold standard mat and high construct validity compared to the mJOA suggesting the utility of the SynapTrack app as an efficient alternative to existing methods. The confirmation of gait metrics related to CSM severity and identification of relevant waveform and frequency domain features present opportunities to use smartphone apps to develop ecologically valid data driven markers of CSM severity.
Kurt, F.; Subasi, A.
Show abstract
Background: Traditional diagnostic models lack explainability, while multimodal language models prone to hallucination remain unsafe for medical education. An interactive, risk-free artificial intelligence framework is required to serve as a reliable clinical mentor for radiology trainees. Methods: We propose a multi-agent architecture decoupling deterministic image analysis from generative consultation. Specialized computer vision models perform anatomical localization and pathological segmentation. These quantitative outputs are synthesized into a structured payload, which grounds a locally hosted large language model (LLaVA 7B) using strict prompt guardrails and prerequisite protocols. Results: The system effectively eliminates visual hallucinations by intercepting unanchored queries. The artificial intelligence tutor successfully contextualizes spatial anomalies and baseline metrics, generating accurate conversational explanations and formally structured radiology reports while strictly enforcing medical safety disclaimers. Discussion and Conclusion: By anchoring language generation exclusively to verified algorithmic realities, this framework transforms opaque diagnostic models into safe, interactive educational simulators. This establishes a highly reliable paradigm for integrating explainable artificial intelligence into medical training.
Yakdan, S.; Singh, P.; Arkam, F.; Chen, E.; Lewis, A.; Steel, B.; Becker, I.; Guo, W.; Naveed, H.; Wang, C.; Yang, D.; Wang, Z.; Ray, W. Z.; Hassenstab, J.; Steinmetz, M. P.; Ghogawala, Z.; Kelleher, C.; Greenberg, J.
Show abstract
Background and Objectives: Cervical spondylotic myelopathy (CSM) is a leading cause of neurological disability in older adults. However, validated, scalable tools to quantify disease severity and changes over time are lacking. Recent advances in smartphone technology have opened new avenues for longitudinal, objective, and remote monitoring of neurological conditions. We performed a preliminary evaluation of the reliability and validity of SynapTrack, a smartphone-based digital platform for objective remote CSM assessments. Methods: In this single-center prospective cohort study, 265 participants (151 with CSM, 114 healthy controls) completed in-person SynapTrack assessments related to tapping, pinching, and vibratory detection, along with reference laboratory measures of dexterity (Box and Block Test, 9-Hole Peg Test) and vibratory sensation (tuning fork). A subset completed repeated home-based testing to assess test-retest reliability. We evaluated convergent validity, construct validity against the modified Japanese Orthopedic Association (mJOA) score, known-groups validity, and test-retest reliability (intraclass correlation coefficient, ICC). Results: Smartphone-derived metrics demonstrated good-to-excellent test-retest reliability, with the strongest stability for vibratory detection threshold (ICC = 0.92), overall and non-dominant tapping speed (ICC = 0.90 each), and pinching successful targets (ICC = 0.90). Convergent validity was supported by moderate-to-strong correlations between digital metrics and reference laboratory dexterity tests ({rho} up to 0.60 for tapping speed; up to -0.65 for the vibratory threshold). Construct validity against the mJOA was strongest for the vibratory threshold ({rho} = -0.53 to -0.54) and Level 2 non-dominant pinching errors ({rho} = -0.45). Selected metrics distinguished CSM patients from controls with good discrimination, including non-dominant tapping speed (AUROC = 0.76, 95% CI 0.68-0.85), Level 2 dominant pinching successful targets (AUROC = 0.78, 95% CI 0.62-0.94), and the non-dominant vibratory threshold (AUROC = 0.77, 95% CI 0.64-0.90). Conclusions and Relevance: A smartphone-based battery of upper-extremity sensorimotor tasks demonstrated preliminary reliability and validity in CSM. Furthermore, to our knowledge, the novel vibratory detection task represents the first smartphone-based sensory assessment used for CSM. Collectively, these findings position SynapTrack as a scalable platform for objective, remote neurological monitoring of CSM.
Bender, J.; Stoks, J.; Barrios Espinosa, C.; Becker, S.; Cluitmans, M. J. M.; Loewe, A.
Show abstract
Background and Aims: Clinical interpretation of the precordial leads V1-V6 assumes that Wilson's central terminal (WCT) has a fixed anatomical location. Consequently, a positive signal corresponds to electrical activation spreading from WCT towards the respective electrode, and vice versa. However, the location of WCT has never been systematically investigated. Yet, a better understanding of WCT location could improve the interpretation of the precordial leads. This work aims to characterize the spatial expansion and location of the physical WCT i.e., the electrical potential defined by the WCT, during the P-wave on the body surface. Methods: An intensive analysis of body surface potential maps (BSPMs) during atrial depolarization in an in silico patient cohort and clinical data was conducted. Results: During the P-wave, the location of WCT was not stationary but the spatial extent and location varied across time as well as across individuals. Four distinct spatial patterns of WCT distribution on the body surface were identified in silico, and three of these were found in the clinical cohort. WCT signals agreed with BSPM signals at commonly assumed positions of WCT only for a small fraction of the P-wave. Conclusion: The spatial extension and location of WCT changes during the P-wave and thus should be considered when interpreting the precordial leads.
Tang, G.; Li, X.; Xiao, Y.; Wang, K.; Wu, M.; Wei, Z.; Yu, M.; Chen, X.; Hong, W.; Cheng, F.; Li, X.; Zhang, J.; Wu, X.; Hong, S.
Show abstract
Hypokalemia is a common and potentially life-threatening electrolyte abnormality in emergency care, yet rapid noninvasive screening remains difficult in time-critical triage settings. We developed PocketED-K, a single-lead AI-ECG prescreening model initialized from ECGFounder, and evaluated it in retrospective multicenter cohorts and a prospective handheld pilot. Retrospective development and validation included 37,115 patients from MC-MED and MIMIC-ED, and the pilot enrolled 18 patients at Peking University First Hospital. Hypokalemia was defined as venous serum potassium < 3.5 mmol/L. PocketED-K achieved AUROCs of 0.8189 (95% CI 0.8172--0.8207) in internal testing, 0.8104 (95% CI 0.8092--0.8115) in temporal validation, and 0.7889 (95% CI 0.7692--0.8074) in independent external validation; external negative predictive value was 0.9911 (95% CI 0.9895--0.9925). Higher predicted risk was associated with ST-segment depression, T-wave flattening or inversion, and relative U-wave prominence. The prospective handheld pilot provided an initial signal of workflow feasibility in real-world acquisition. These findings support single-lead AI-ECG as a low-burden prescreening tool to prioritize potassium testing in emergency care.
Elemento, O.; Sigaras, A.; Colonel, J.; Hajirasouliha, I.; Ghosh, S.; Bensoussan, Y.; Bridge2AI-Voice Consortium, ; Rameau, A.
Show abstract
Vocal biomarkers, encompassing voice and speech, have largely been developed for individual conditions in isolation, limiting their generalizability across diseases and recording settings. To address this, we introduce VoiceFM, a contrastive model that learns general-purpose clinical voice representations by aligning audio embeddings with rich clinical metadata. Using the Bridge2AI-Voice dataset (984 primarily English-speaking adult participants, 846 used for training and 138 held out as a temporally separated validation cohort, 40,056 recordings totaling 176 hours across 5 academic medical centers), VoiceFM pairs a fine-tuned Whisper large-v2 encoder with a tabular transformer over 44 clinical features via symmetric InfoNCE loss. Linear probes on frozen VoiceFM embeddings achieve mean AUROC 0.952 +/- 0.005 across five evaluation tasks (control vs disease screening plus four disease categories), significantly outperforming Frozen Whisper (0.926 +/- 0.013, p = 0.013), Frozen HuBERT (0.885 +/- 0.017, p = 0.0009), and the contrastively trained VoiceFM-HuBERT (0.938 +/- 0.006, p = 0.012). On the 138-participant held-out cohort, VoiceFM-Whisper achieves AUROCs of 0.99 for Alzheimer's/dementia/MCI and 0.89 for airway stenosis, demonstrating that the learned representations generalize to participants the model has never seen. VoiceFM representations transfer to three external datasets without retraining and improve few-shot classification. Recording task attribution identifies a small set of speech tasks that match or exceed the full battery's performance, suggesting shorter screening protocols are feasible. Trained predominantly on English audio, VoiceFM transfers without fine-tuning to Spanish-language Parkinson's disease (PD) detection (NeuroVoz, 107 participants, AUROC 0.93 +/- 0.02), with the signal dominated by articulatory rather than phonatory features. A fine-tuned classifier achieves participant-level AUROC 0.87 (sustained 0.85, countdown 0.80) on the mPower smartphone study (585 held-out participants). Together, these results show that contrastive alignment between voice and rich clinical metadata can serve as the basis for a clinical voice foundation model, producing a single set of transferable representations that generalize across diseases, languages, recording conditions, and patients enrolled after model freeze.